ABSTRACT

We present a technique to identify optical counterparts of 250 μm-selected sources from the
Herschel-ATLAS survey. Of the 6621 sources brighter than 32 mJy at 250 μm in our science
demonstration catalogue, we find that ~60 percent have counterparts brighter than r = 22.4 mag
in the Sloan Digital Sky Survey. Applying a likelihood ratio technique we are able to identify
2423 of the counterparts with a reliability R > 0.8. This is approximately 37 percent of the
full 250 μm catalogue. We have estimated photometric redshifts for each of these 2423 reliable
counterparts, while 1099 also have spectroscopic redshifts collated from several different
sources, including the GAMA survey. We estimate the completeness of identifying counterparts
as a function of redshift, and present evidence that 250 μm-selected Herschel-ATLAS
galaxies have a bimodal redshift distribution. Those with reliable optical identifications have
a redshift distribution peaking at z ≈ 0.25 ± 0.05, while sub-mm colours suggest that a
significant fraction with no counterpart above the r-band limit have z > 1. We also suggest a
method for selecting populations of strongly-lensed high redshift galaxies. Our identifications
are matched to UV-NIR photometry from the GAMA survey, and these data are available as
part of the Herschel-ATLAS public data release.

Key words: Galaxies: Local, Galaxies: Infrared, Galaxies: Star-forming, Methods: Statistical,
Submillimetre: Galaxies



1 INTRODUCTION 

One of the key problems to overcome when conducting multi- 
wavelength surveys is determining which sources are associated 
with one another in different wave-bands, and which are unrelated. 
When multiple observations have been conducted at similar wave- 
lengths and with similar resolution and sensitivity, this problem can 
be reliably addressed by using a simple nearest-neighbour match. 
However, in the situation where the two distinct sets of observations 
to be matched have considerably different resolution - for example 
matching far-infrared or sub-millimetre survey data to an optical 
catalogue (e.g. Sutherland et al., 1991, Clements et al., 1996, Ser- 
jeant et al., 2003, Clements et al., 2004, Ivison et al. 2005, 2007, 
Wang & Rowan-Robinson, 2009, Biggs et al. 2010) - the large 
positional uncertainties in the longer-wavelength data can make 
it much more difficult to find reliable associations between sub- 
millimetre sources and their optical/near-infrared counterparts. 

One method which can be used to identify the most likely
counterpart to a low-resolution source is the Likelihood Ratio technique
(hereafter LR), first suggested by Richter (1975), and expanded
by Sutherland & Saunders (1992) and Ciliegi et al. (2003).
The crucial advantage of the LR technique over other methods is 
that it not only uses the positional information contained within the 
two catalogues, but also includes brightness information (both of 
the individual potential counterparts, and of the higher resolution 
catalogue as a whole) to identify the most reliable counterpart to a 
low-resolution source. 

The Herschel Astrophysical Terahertz Large Area Survey
(Herschel-ATLAS, Eales et al., 2010) is the largest open-time key
project that will be carried out with the Herschel Space Observatory
(Pilbratt et al., 2010). The Herschel-ATLAS will survey in
excess of 550 deg² in five channels centred on 100, 160, 250, 350
and 500 μm, using the PACS (Poglitsch et al., 2010) and SPIRE
instruments (Griffin et al., 2010). This makes Herschel-ATLAS currently
the largest area extragalactic Herschel survey. The Herschel-ATLAS
observations consist of two scans in parallel mode reaching
5σ point source sensitivities of 132, 126, 32, 36 and 45 mJy
in the 100 μm, 160 μm, 250 μm, 350 μm and 500 μm channels
respectively, with beam sizes of approximately 9, 13, 18, 25 and 35
arcsec in the same five bands. The SPIRE and PACS map-making
procedures are described in the papers by Pascale et al. (2010) and
Ibar et al. (2010), while the catalogues are described in Rigby et
al. (2010). One of the primary aims of the Herschel-ATLAS was
to obtain the first unbiased survey of the local Universe at sub-mm
wavelengths, and as a result the survey was designed to overlap
with existing large optical and infrared surveys.

In this paper, we present a discussion of our implementation
of the LR technique to identify the most reliable counterparts to
250 μm-selected sources in the Herschel-ATLAS science demonstration
phase (SDP) data field (Eales et al., 2010). This field was
chosen in order to take advantage of multi-wavelength data from
the Sloan Digital Sky Survey (SDSS - York et al., 2000), and the
UK Infrared Deep Sky Survey Large Area Survey (UKIDSS-LAS
- Lawrence et al., 2007). This field also overlaps with the 9 hour
field of the Galaxy And Mass Assembly survey (GAMA - Driver
et al., 2010). The GAMA catalogue (Hill et al., 2011) comprises
not only thousands of redshifts (for galaxies selected as described
in Baldry et al., 2010, and observed with the maximum possible
tiling efficiency - Robotham et al., 2010), but also r-band-defined
aperture-matched photometry in the ugrizYJHK bands. In addition,
the GAMA fields are being systematically observed using the
Galaxy Evolution EXplorer (GALEX) satellite (Martin et al., 2005)
at Medium Imaging Survey depth to provide aperture-matched
FUV and NUV counterparts to the catalogued GAMA sources (the
GALEX-GAMA survey; Seibert et al., in prep). These counterparts
will potentially be of great scientific value once the most reliable
optical counterpart can be established for each Herschel-ATLAS
source.

In section 2 we present the specific LR method that we have
used to identify counterparts to 250 μm-selected sources from the
Herschel-ATLAS SDP catalogue in an r-band catalogue of model
magnitudes derived from the SDSS DR7. In section 3 we present
the redshift properties of our catalogue, which covers ~16 deg²
over the GAMA 9 hour field. Section 4 contains some basic results
based on our reliable catalogue, and in section 5 we present some
concluding remarks about the likelihood ratio technique and the
resulting catalogue.



2 CALCULATING THE LIKELIHOOD RATIO 

The likelihood ratio, i.e. the ratio between the probability that the
source is the correct identification and the corresponding probability
for an unrelated background source, is calculated as in Sutherland
& Saunders (1992):

$$ L = \frac{q(m)\, f(r)}{n(m)} \tag{1} $$



in which n(m) and q(m) correspond to the SDSS r-band magnitude
probability distributions of the full r-band catalogue and of
the true counterparts to the sub-millimetre sources, respectively,
while f(r) represents the radial probability distribution of offsets
between the 250 μm positions and the SDSS r-band centroids. We
will now describe how we calculate each component of this relationship
in turn.



2.1 Calculating the radial dependence of the likelihood ratio, f(r)

Here, f(r) is the radial probability distribution function of the positional
errors as a function of the separation from the SPIRE 250 μm
position in arcseconds (r), given by:

$$ f(r) = \frac{1}{2\pi\sigma_{\rm pos}^2} \exp\left( \frac{-r^2}{2\sigma_{\rm pos}^2} \right) \tag{2} $$

where r is the separation between the 250 μm and r-band positions,
and σ_pos is the standard positional error (which is assumed
to be isotropic).
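
As an illustration of equation 2, the following minimal Python sketch evaluates f(r) for an isotropic Gaussian positional error and checks numerically that it integrates to unity over the plane. The function names, the integration grid and the choice of σ_pos = 2.40 arcsec as the test value are illustrative assumptions, not part of the published analysis.

```python
import math

def radial_pdf(r_arcsec, sigma_pos_arcsec):
    """f(r): isotropic 2-D Gaussian positional error density (per square arcsec)."""
    s2 = sigma_pos_arcsec ** 2
    return math.exp(-r_arcsec ** 2 / (2.0 * s2)) / (2.0 * math.pi * s2)

def integrate_over_plane(sigma_pos_arcsec, r_limit_arcsec=50.0, steps=20000):
    """Midpoint-rule estimate of the integral of 2*pi*r*f(r) dr from 0 to r_limit."""
    dr = r_limit_arcsec / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += 2.0 * math.pi * r * radial_pdf(r, sigma_pos_arcsec) * dr
    return total

# Hypothetical check value: a positional error of 2.40 arcsec.
norm = integrate_over_plane(2.40)
print(f"integral of f(r) over the plane: {norm:.6f}")
```

Because f(r) is a surface density, the factor 2πr is needed when integrating in radius; omitting it is a common source of normalisation errors.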

For Herschel-ATLAS SDP observations, it was necessary to
determine the SPIRE positional uncertainties. Since this information
was not available a priori, we empirically estimated σ_pos using
the SDSS DR7 r-band catalogue positions, assuming that the
SDSS positional errors were negligible in comparison to the SPIRE
errors. To determine σ_pos, we derived histograms of the separations
between the positions in the MAD-X SPIRE catalogue (Rigby et
al., 2010) of the 5σ 250 μm sources, and all of those objects in
the r-band SDSS DR7 catalogue within 50 arcsec, doing this for
both the North-South and East-West directions (Figure 1). These
histograms can be well-described as the sum of the Gaussian positional
errors plus the clustering signal for SDSS sources convolved
with Gaussian errors, G(θ, σ), with σ = σ_pos:



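$$ n(x) = G'(x, \sigma_{\rm pos}) + \left( \sum_y w(\theta) * G(\theta, \sigma_{\rm pos}) \right) \tag{3} $$

$$ n(y) = G'(y, \sigma_{\rm pos}) + \left( \sum_x w(\theta) * G(\theta, \sigma_{\rm pos}) \right) \tag{4} $$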



where w(θ) = Aθ^(-δ), with θ being measured in degrees for the
purposes of comparison with the literature. We determined the
values of A and δ empirically based solely on galaxies in the
SDSS catalogue over > 35 deg² centred on the Herschel-ATLAS
SDP field (limited to r < 22.4), with the best fit parameters
A = 6.89 ± 0.90 × 10"^ and δ = 0.689 ± 0.069, in reasonable
agreement with the values of Connolly et al., 2002. The effects of
clustering (i.e. w(θ) * G(θ, σ_pos)) are shown in the top panel of
Figure 1.

In order to determine the 1σ positional error of the 250 μm selected
catalogue, we conducted a simple χ² fit of our model (equations
3 & 4) to the histograms. The results are shown in Figure 1
for the summations in the East-West and North-South directions in
the middle and bottom panels respectively. The clustering signal is
shown in the bottom two panels by the dotted lines, with the histograms
and their Poisson error bars overlaid with the best fit model
(solid lines). The 1σ positional errors were found to be 2.49 ± 0.10
arcsec and 2.33 ± 0.09 arcsec in the two directions, consistent with
one another within the errors. The advantages of this method are
two-fold: firstly, it is not necessary to identify the counterparts to
the 250 μm sources a priori, and secondly, the centroids of the best
fit Gaussians may be used to determine astrometric corrections in
the SPIRE maps (e.g. Pascale et al., 2010). The value for σ_pos that
we adopted was the weighted mean 2.40 ± 0.09 arcsec.
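
The adopted value can be checked with a short calculation. The sketch below assumes that "weighted mean" refers to the standard inverse-variance weighting of the two directional measurements; the text does not state the weighting scheme, so this is an assumption, and the helper name is hypothetical.

```python
def inverse_variance_mean(values, errors):
    """Inverse-variance weighted mean of independent measurements."""
    weights = [1.0 / e ** 2 for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# 1-sigma positional errors in the two directions (arcsec), as quoted in the text.
sigma_pos = inverse_variance_mean([2.49, 2.33], [0.10, 0.09])
print(f"{sigma_pos:.2f} arcsec")  # prints 2.40 arcsec
```

Under this assumption the mean reproduces the adopted 2.40 arcsec. The formal uncertainty of an inverse-variance mean would be smaller than the quoted ± 0.09 arcsec, so the published uncertainty was presumably estimated in a different, more conservative way.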

Theoretically, the positional uncertainty should depend on the
signal-to-noise ratio (SNR) of the detection and on the full-width at
half maximum (FWHM) of the SPIRE 250 μm beam (18.1 arcsec,
Pascale et al., 2010), following the results derived in Ivison et al.
(2007; σ_th ≈ 0.6 FWHM/SNR) and assuming the case of uncorrelated
noise. We use our empirical results in Figure 1 to calibrate the
theoretical relation presented in Ivison et al. (2007) to our data, and
assume that our results are symmetric in RA and Dec. This leads
us to introduce a factor of 1.09 to give equation 6:

$$ \sigma_{\rm th} = 0.6\, \frac{\rm FWHM}{\rm SNR} \tag{5} $$

$$ \sigma_{\rm pos} = 1.09 \times \sigma_{\rm th} \tag{6} $$



Although the SNR of some SPIRE 250 μm sources is very
high, it is unphysical to allow σ_pos in equation 6 to approach zero
for three main reasons:

• Whilst it is acceptable to neglect the SDSS DR7 positional
errors for the purposes of determining σ_pos (section 2.1), the astrometric
precision for sources in the SDSS DR7 catalogue is non-zero
(< 0.1 arcsec - Abazajian et al., 2009).

• Large sources, especially those without Gaussian surface
brightness profiles (e.g. bright spiral galaxies), have considerably
larger positional uncertainties associated with them.

• Confusion provides a lower limit to the positional errors of
the SPIRE catalogue, although the SNR in equation 6 does include
confusion noise as described in Rigby et al. (2010) and Pascale et
al. (2010).

Other effects that can influence the positional uncertainty include 
imprecise knowledge of the beam morphology and the effects of 
drifts and jitter in the Herschel pointing model. 

To account for these effects, we do not allow the positional uncertainty
to fall below 1 arcsec, and we also include a term which
adds 5 percent of the SDSS r-band isophotal major axis in quadrature
to the value determined by equation 6 for those sources with
r-band model magnitudes < 20.5. Finally, f(r) must be renormalised
so that

$$ 2\pi \int_0^{\infty} f(r)\, r\, {\rm d}r = 1. \tag{7} $$
Figure 1. In order to derive the 1σ positional errors of the SPIRE 250 μm-selected
sources, we produced histograms of the total number of SDSS
sources within a box 50 arcsec on a side around the SPIRE 250 μm centres.
After accounting for the clustering of SDSS sources (the top panel
shows the signal expected for the clustering of SDSS sources in the RA and
Dec directions convolved with Gaussian positional errors - the results are
shown as solid and dashed lines for RA and Dec, respectively, and these
results appear as the dotted lines in the bottom two panels), we can add
in an appropriate Gaussian distribution of centres to account for the actual
positions of the SPIRE sources (equations 3 & 4). Performing a χ² minimisation
allows us to then empirically determine the 1σ positional uncertainty
for these sources, which are shown as σ_RA and σ_Dec.
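
The positional-uncertainty prescription of section 2.1 can be summarised in a few lines of Python. This is a sketch under stated assumptions: the text does not specify whether the 1 arcsec floor is applied before or after the extended-source term is added in quadrature, so the order used here (quadrature first, floor last) is a guess, and the function and argument names are hypothetical.

```python
import math

FWHM_250_ARCSEC = 18.1    # SPIRE 250 micron beam FWHM quoted in the text
SIGMA_FLOOR_ARCSEC = 1.0  # minimum positional uncertainty quoted in the text

def sigma_pos(snr, major_axis_arcsec=None, r_mag=None):
    """Positional uncertainty (arcsec) for a 250 micron source of given SNR.

    major_axis_arcsec and r_mag describe a candidate r-band counterpart;
    the extended-source term is only applied for r_mag < 20.5.
    """
    sigma_th = 0.6 * FWHM_250_ARCSEC / snr   # Ivison et al. (2007) scaling
    sigma = 1.09 * sigma_th                  # empirical rescaling by 1.09
    if major_axis_arcsec is not None and r_mag is not None and r_mag < 20.5:
        # Assumed order: add 5 percent of the major axis in quadrature first.
        sigma = math.hypot(sigma, 0.05 * major_axis_arcsec)
    return max(sigma, SIGMA_FLOOR_ARCSEC)    # then apply the 1 arcsec floor

print(sigma_pos(5.0))              # a 5-sigma detection
print(sigma_pos(100.0))            # very high SNR, limited by the floor
print(sigma_pos(100.0, 40.0, 15))  # bright, extended counterpart
```

For a source at the 5σ detection limit this gives a value close to the empirically measured 2.40 arcsec, as expected given that the factor of 1.09 was calibrated against that measurement.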



2.2 Calculating the magnitude dependence of the likelihood 
ratio 

Calculating the LR requires two further pieces of magnitude information,
n(m) and q(m). The quantity n(m) is simply the probability
that a background source is observed with magnitude m. To
estimate this, we calculate the distribution of SDSS DR7 r-band
model magnitudes for all of the primary photometry sources in the
catalogue, normalised to the total area of the catalogue (which is
approximately 36.0 deg² for the SDSS catalogue that we use for
this purpose).

The non-trivial step is the calculation of q(m), the probability
that a true counterpart to a 250 μm source has a magnitude
m. To estimate this we calculate the r-band magnitude distribution
of the counterparts to the 250 μm sources using the method
of Ciliegi et al. (2003). This method involves counting all objects
in the optical catalogue within some fixed maximum search radius
(r_max) of the SPIRE positions. To avoid influencing the results of
this analysis with erroneous deblends in the SDSS DR7 catalogue
(which artificially alter the number counts), we visually inspected the SDSS
r-band images of each of the 5σ 250 μm sources, removing 370
SDSS sources from the input catalogue. The magnitude distribution
of the remaining objects is referred to as total(m). Here we
have adopted r_max = 10 arcsec, which encloses >99.996% of the
real counterparts to the 250 μm sources based on our derived value
for σ_pos. The distribution total(m) is then background-subtracted
to leave the magnitude distribution of excess sources around the
250 μm centres, real(m):



$$ {\rm real}(m) = {\rm total}(m) - \left[ n(m) \times N_{\rm centres} \times \pi \times r_{\rm max}^2 \right], \tag{8} $$

where N_centres is the number of 250 μm sources in the catalogue.
This enables us to empirically estimate q(m) from the sources
in our optical catalogue rather than modelling the r-band magnitude
distribution of 250 μm-selected Herschel-ATLAS sources.
The distribution q(m) is given by equation 9:



$$ q(m) = \frac{{\rm real}(m)}{\sum_m {\rm real}(m)} \times Q_0 \tag{9} $$



Q0 is the fraction of true counterparts which are above the SDSS
limit, and is calculated thus:

$$ Q_0 = \frac{N_{\rm matches} - \left( \sum_m n(m) \times \pi r_{\rm max}^2 \times N_{\rm centres} \right)}{N_{\rm centres}} \tag{10} $$

Here N_matches represents the number of possible IDs within 10.0
arcsec of the SPIRE positions, and N_centres is defined as above.
Since the value of Q0 will be different for galaxies and unresolved
sources in our catalogue, we must calculate q(m), n(m) and Q0
separately for each population.
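
The background subtraction and the estimate of Q0 can be sketched on binned counts as follows. The magnitude bins and numbers are invented toy values, n(m) is assumed to be expressed as a surface density per square arcsec in each bin, and q(m) is assumed to be normalised so that it sums to Q0 (the convention of Ciliegi et al., 2003); the function names are hypothetical.

```python
import math

def background_per_centre(n_density, r_max_arcsec):
    """Expected background counts per magnitude bin inside one search circle."""
    area = math.pi * r_max_arcsec ** 2
    return [n * area for n in n_density]

def estimate_q0(total, n_density, n_centres, r_max_arcsec):
    """Excess of candidates over the expected background, per 250 micron centre."""
    n_matches = sum(total)
    expected = n_centres * sum(background_per_centre(n_density, r_max_arcsec))
    return (n_matches - expected) / n_centres

def estimate_q(total, n_density, n_centres, r_max_arcsec):
    """Background-subtract total(m) to get real(m), then normalise to sum to Q0."""
    bg = background_per_centre(n_density, r_max_arcsec)
    real = [t - n_centres * b for t, b in zip(total, bg)]
    q0 = estimate_q0(total, n_density, n_centres, r_max_arcsec)
    norm = sum(real)
    return [q0 * x / norm for x in real]

# Toy example: three magnitude bins, 1000 centres, 10 arcsec search radius.
R_MAX = 10.0
AREA = math.pi * R_MAX ** 2
total = [50, 200, 400]                                # candidates per bin
n_density = [0.01 / AREA, 0.05 / AREA, 0.20 / AREA]   # background per arcsec^2
q0 = estimate_q0(total, n_density, 1000, R_MAX)
q = estimate_q(total, n_density, 1000, R_MAX)
print(q0, q)
```

A real implementation would also need to handle bins in which the noisy background subtraction drives real(m) negative, and the sparsely sampled bright end; neither is treated in this sketch.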

We separate resolved and unresolved sources using a slightly
modified version of the GAMA colour-colour relation from Baldry
et al. (2010, modified such that Δ_sg,jk > 0.40 rather than 0.20 to
avoid adding an unphysical sharp edge to the stellar locus in Figure
2). Having separated the two populations, we corrected the positions
of the unresolved sources for known proper motions in the
USNO/SDSS DR7 catalogue (Munn et al., 2004), precessing their
co-ordinates to the epoch of the Herschel-ATLAS SDP observations.
Only those unresolved sources with proper motions detected
at a SNR ≥ 3 were updated.

For our SDSS DR7 r-band catalogue, Q0^gal = 0.583, i.e. 58.3
percent of the galaxy counterparts are brighter than our magnitude
limit. For the unresolved sources the value is Q0^unres = 0.010, indicating
that only 1 percent of the unresolved sources in the catalogue
are detected at ≥ 5σ in our 250 μm data (although see section
2.3.1). Thus we determine that overall Q0 = Q0^gal + Q0^unres =
0.593.

The distributions of q(m) and n(m) (as well as the magnitude
dependence of the LR - q(m)/n(m)) are shown in Figure 3,
in which the left and right columns show the values for the resolved
and unresolved sources, respectively. While the q(m) distribution
for galaxies is well-sampled at r > 14 mag, we assume that q(m)/n(m)
is constant for all sources brighter than this, enabling us to
use our well-defined n(m) to estimate q(m) for the brightest galaxies.

Since the fraction of Herschel-ATLAS sources associated
with unresolved counterparts is low (reflected in Q0^unres = 0.010),
the method used to determine q(m) for these sources differs. In
order to ensure that the LR results for stars/QSOs are not dominated
by small number statistics, we assume a flat prior on q(m),
normalised to retain Q0^unres = 0.010 (Figure 3).

Figure 3. Deriving the magnitude dependence of the LR for the resolved and unresolved counterparts to the 250 μm catalogue (left and right columns, respectively).
The analysis for the resolved sources is discussed first, while the alternative procedure followed for the unresolved sources is described subsequently.
Top Left: Total(m) (blue, dotted) represents the SDSS DR7 r-band model magnitude distribution of all the resolved sources that lie within 10.0 arcsec of the
SPIRE 250 μm centres. The black histogram represents the number of galaxies that we would expect within these search radii due to the background SDSS
number counts alone. The red histogram, dubbed real(m) as per Ciliegi et al. (2003), is the difference between the two, i.e. the SDSS DR7 r-band model
magnitude distribution of the excess sources above the background. Middle Left: The ratio q(m)/n(m) represents the magnitude dependence of the LR.
To avoid having a zero probability of a given source being the real counterpart due to our limited statistics on real(m) (and hence q(m)) at bright magnitudes,
we use the ratio q(m)/n(m) in the brightest well-sampled bin (r_mag = 14.2) to define the values of q(m) for resolved sources with SDSS r-band
magnitude ≤ 14.0. Bottom Left: The resulting q(m) distribution - our best estimate of the probability that a true counterpart to a 250 μm source has a magnitude
m - using the n(m) distribution to overcome the small number statistics at bright r-band magnitudes. To further reduce the effects of noise, we boxcar
smooth the q(m) distribution for resolved sources with a 3 bin kernel. Q0 for the resolved sources is determined to be 0.583. For the unresolved sources
(right column), which have considerably fewer excess sources (reflected in the lower value of Q0^unres), the corresponding analysis is slightly different. We
define the magnitude dependence of the LR - q(m)/n(m) - assuming a constant q(m) (bottom panel, right column) normalised to reflect Q0^unres = 0.010.
This situation will improve with the higher quality statistics that the full Herschel-ATLAS catalogue will produce.

Figure 2. Colour-colour diagram for sources in the GAMA catalogue. We
use the relationship of Baldry et al. (2010) to distinguish between unresolved
sources (stars and QSOs) and galaxies. Those sources with stellar/QSO
colours in the SDSS/LAS catalogue data over the GAMA 9 hour field
are displayed in red, while those with galaxy colours are displayed in black.
Objects with colours consistent with QSOs are located toward the upper
left corner of this plot. Of the five R ≥ 0.8 sources in our catalogue classified
as unresolved, we find that three satisfy the GAMA colour selection
criteria for being stellar, and so are potentially evolved stars, dust-obscured
QSOs or debris disk candidates possibly indicative of a proto-planetary system
(e.g. Thompson et al., 2010). The dashed line describes the first order
star-galaxy separation locus (for more details see Baldry et al., 2010). The
star-galaxy separation locus has been modified slightly from the Baldry et
al. value due to the fainter magnitudes considered in our survey.

We can correct our value for Q0 for the clustering of SDSS
sources by simply dividing Q0 by 1 + ∫ w(θ) dθ = 1.0008
(remembering that θ is measured in degrees), giving a clustering-corrected
value of Q0 = 0.592. This value is broadly consistent
with the recent results of Dunlop et al. (2010), who recover optical
counterparts to 8 out of 20 250 μm sources brighter than 36 mJy
in data from the BLAST observations of the GOODS-South field
to a comparable i-band magnitude (albeit with lower angular resolution
at 250 μm and much more sensitive optical, infrared and
radio data), while Dye et al. (2009) found 80 counterparts to the
175 BLAST 250 μm sources brighter than 55 mJy down to similar
magnitude limits in r- or R-band data (S. Dye, private communication).

To account for the fact that a Herschel-ATLAS source may
have more than one possible counterpart, we also define a reliability
R_j for each object j being the correct counterpart out of all those
counterparts within r_max, again following Sutherland & Saunders
(1992):

$$ R_j = \frac{L_j}{\sum_i L_i + (1 - Q_0)} \tag{11} $$



where the LR values have been determined for the resolved and
unresolved counterparts separately (see Figure 4). The reliability is
a key statistic; we recommend using only those counterparts with
reliability R ≥ 0.8 for analysis, since this ensures not only that the
contamination rate is low (see below), but also that only one r-band
source dominates the far-infrared emission (as required for e.g. deriving
spectral energy distributions for 250 μm-selected galaxies in
the Herschel-ATLAS catalogue; Smith et al., in prep). This is more
conservative than other works in the literature (e.g. Chapin et al.,
2010), where the chosen LR limit was defined based on a 10 percent
sample contamination rate.

Figure 4. Top: Histogram of the Likelihood Ratio values for all of the 7230
potential counterparts to the 6621 5σ sources in the SPIRE SDP catalogue.
The LR values for the resolved sources (i.e. galaxies) are shown as the solid
histogram, with unresolved sources shown as the dotted histogram. Middle:
Reliabilities for each counterpart. Once more, the solid histogram represents
the resolved sources, while the dotted histogram represents the unresolved
sources. There are a total of 2423 sources which have a reliability
≥ 0.8, of which five are unresolved using the star/galaxy separation criteria
of Baldry et al. (2010). Bottom: The variation of the reliability as a function
of the likelihood ratio. This is not a linear relation since some sources
have more than one counterpart with a high likelihood ratio. There are 263
SDSS r-band sources with reliability < 0.8 but L > 1.63 (the value above
which R > 0.8 for a single counterpart within the 10.0 arcsec maximum
search radius). These may be interacting systems, as discussed in section
4.3. These sources also demonstrate a possible limitation of the LR method,
since the method implicitly assumes that there is only one true counterpart
to a given 250 μm source.
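
The reliability of equation 11 is straightforward to evaluate once the LR of every candidate within r_max is known. The sketch below uses hypothetical function names and the clustering-corrected Q0 = 0.592; it also inverts the relation for a lone candidate, which recovers the L ≈ 1.63 threshold quoted in the caption of Figure 4.

```python
def reliabilities(likelihood_ratios, q0):
    """Reliability R_j of each candidate around one 250 micron source."""
    denom = sum(likelihood_ratios) + (1.0 - q0)
    return [lr / denom for lr in likelihood_ratios]

def single_candidate_threshold(r_min, q0):
    """Smallest L giving R >= r_min when only one candidate lies within r_max."""
    return r_min * (1.0 - q0) / (1.0 - r_min)

Q0 = 0.592  # clustering-corrected value quoted in the text
threshold = single_candidate_threshold(0.8, Q0)
print(round(threshold, 2))  # prints 1.63

# Two equally likely candidates: neither can reach R >= 0.8,
# however large their individual likelihood ratios are.
pair = reliabilities([5.0, 5.0], Q0)
print(pair)
```

The second example shows why a source with two high-LR candidates drops out of the reliable sample: each candidate's reliability is capped below 0.5 by the presence of the other.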

In order to estimate the number of false IDs in our reliable
sample, we calculate:

$$ N({\rm false}) = \sum_{R \geq 0.8} (1 - R). \tag{12} $$



As a result we expect 103 false IDs in our sample, which corresponds
to a contamination rate of 4.2%. For those investigations
in which it is desirable only to determine whether an optical
source is associated with a Herschel-ATLAS object (with additional
caveats about lensed sources and the de-blending efficiency
in the optical catalogue), it is sufficient to use a likelihood ratio cut
(e.g. L > 5.0, i.e. the source is 5 times more likely to be associated
with the sub-millimetre object than it is to be a chance superposition
of sources). This aspect of the likelihood ratio technique is
discussed in more detail in section 4.3.

Table 1. The distribution of the number of SDSS r-band sources within
10.0 arcsec of the 250 μm positions, and the fraction of reliable counterparts.
There are 2869 sources with only one possible match within 10.0
arcsec, and yet only 1389 of these are determined to be reliable; the vast
superiority of the LR technique over a simple nearest-neighbour algorithm
is evident.

N(matches)    N(250 μm sources)    N(R ≥ 0.8)    %
0             1865                 0
1             2869                 1389          48.4
2             1400                 782           55.9
3             400                  210           52.5
4             76                   38            50.0
5             9                    4             44.4
6             2                    0             0.0
TOTAL:        6621                 2423          36.6

In Table 1 we present the number of possible optical counterparts
within 10.0 arcsec of the 250 μm sample, including the
relative fractions of reliable associations. Only half of the 250 μm
sources with a single optical counterpart within the search radius
are deemed reliable.

To estimate the fraction of 250 μm sources with a counterpart
above our detection limit recovered as having R ≥ 0.8, we assume
that 250 μm-selected SDSS sources cluster in the same way
as SDSS r-band-selected sources (the results of Maddox et al.,
2010, suggest that this assumption is reasonable). Under this assumption,
we may calculate the completeness, η, of the reliable
sources in our sample:

$$ \eta = \frac{n(R \geq 0.8)}{n(250\,\mu{\rm m} \geq 5\sigma) \times Q_0} \tag{13} $$



We have reliably identified η = 61.8 percent of the optical counterparts
bright enough to be detected in the SDSS r-band catalogue.
This constitutes an overall identification rate of 36.6 percent for
≥ 5σ 250 μm sources in the Herschel-ATLAS SDP observations.
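
The quoted identification statistics can be reproduced from the numbers given in the text. The sketch below assumes that the completeness is normalised by the clustering-corrected Q0 = 0.592, i.e. by the number of sources expected to have a counterpart above the r-band limit; this assumption reproduces the quoted 61.8 percent. The helper for the expected number of false IDs follows equation 12 and is demonstrated on invented reliabilities only.

```python
N_RELIABLE = 2423  # counterparts with R >= 0.8
N_SOURCES = 6621   # >= 5 sigma 250 micron sources
Q0 = 0.592         # clustering-corrected fraction above the r-band limit

def expected_false_ids(reliabilities, r_min=0.8):
    """Sum of (1 - R) over all counterparts with R >= r_min."""
    return sum(1.0 - r for r in reliabilities if r >= r_min)

id_rate = N_RELIABLE / N_SOURCES
completeness = N_RELIABLE / (N_SOURCES * Q0)
print(f"identification rate: {100 * id_rate:.1f} percent")  # prints 36.6
print(f"completeness: {100 * completeness:.1f} percent")    # prints 61.8

# Invented reliabilities, for illustration only: the 0.50 entry is excluded.
toy_false = expected_false_ids([0.95, 0.85, 0.50])
print(toy_false)
```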



2.3 Checking the identification process 

By selecting sources at 250 μm rather than longer wavelengths, the
negative k-correction that results in e.g. 850 μm-selected galaxies
residing at a median redshift of z ~ 2 (Chapman et al., 2003) has
a much less dramatic effect, and observations have shown that a
significant fraction of 250 μm-selected galaxies reside at z < 1
(e.g. Chapin et al., 2010, Dunlop et al., 2010, Dye et al., 2010).
As a result, their optical counterparts will be much brighter than
850 μm-selected sources, and therefore readily detectable by shallower
optical/near-infrared imaging with a much lower source density.
For the SDSS DR7 r-band source catalogue that we use for the
purposes of this investigation, we expect only ~0.48 background
sources within the 10.0 arcsec search radius, down to the magnitude
limit of r = 22.4 mag (of these, ~0.26/0.22 will be resolved/unresolved,
respectively). Furthermore, these background sources
may be expected to be evenly distributed throughout the area within
the maximum search radius, unlike the true counterparts.

We performed the following simple checks to determine the
effectiveness of the LR technique for the Herschel-ATLAS SDP
catalogue.

Figure 5. Histogram of the 10000 realisations of determining Q0 for 6621
randomly-positioned Herschel-ATLAS sources, designed to determine the
1σ uncertainty on Q0. A Gaussian distribution with median and standard
deviation derived from the histogram is overlaid (dashed line), and the value
of Q0^unres is indicated by the vertical dotted line.



2.3.1 LR analyses of random catalogues 

As a first test of whether the LR technique produces sensible results,
we wanted to test the method in the absence of any true association
between the 250 μm and SDSS positions. We randomised
the positions of the 6621 sources and re-ran the LR analysis 10000
times, recording the derived value of Q0 each time. The histogram
of the resulting Q0 distribution had a median of 0.000 with a 1σ
uncertainty of 0.006. In these cases, where Q0 ~ 0, the values for
L and hence R are unreliable, since the distributions of total(m)
and real(m) that we determine are almost identical, and the latter
is strongly affected by noise as a result (see section 2.2). The histogram
of the simulated values for Q0 is shown in Figure 5, with a
Gaussian distribution with appropriate median and standard deviation
overlaid (dashed line). The derived value of Q0^unres determined
in section 2.2 is overlaid as the vertical dotted line. With Q0^unres
residing within 2σ of the median Q0 value for these random catalogues,
it is not clear that the population of unresolved sources is
detected in the Herschel-ATLAS SDP data (see section 4.4). However,
to avoid the possibility of missing real counterparts of potentially
great scientific importance, we must not ignore the possibility
that unresolved sources are detected in our 250 μm catalogue.
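
The logic of this null test can be illustrated with a toy Monte Carlo. The sketch below is not the published procedure: it scatters uniformly distributed (unclustered) background sources over a small square field at a surface density chosen to give ~0.48 sources per 10 arcsec search circle, places far fewer than 6621 random centres, and evaluates only the Q0 estimator rather than the full LR analysis. All names, the field size and the number of centres are illustrative assumptions.

```python
import math
import random

def q0_from_random_centres(n_centres=200, field_arcsec=1500.0, r_max=10.0,
                           bg_per_circle=0.48, seed=1):
    """Q0 estimate for centres that have no true association with the background."""
    rng = random.Random(seed)
    area = math.pi * r_max ** 2
    n_bg = round(bg_per_circle / area * field_arcsec ** 2)
    background = [(rng.uniform(0.0, field_arcsec), rng.uniform(0.0, field_arcsec))
                  for _ in range(n_bg)]
    # Keep centres at least r_max from the field edge so every circle is complete.
    lo, hi = r_max, field_arcsec - r_max
    centres = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_centres)]
    n_matches = 0
    for cx, cy in centres:
        for bx, by in background:
            if (bx - cx) ** 2 + (by - cy) ** 2 <= r_max ** 2:
                n_matches += 1
    # Expected background uses the realised surface density of the toy field.
    expected = n_centres * (n_bg / field_arcsec ** 2) * area
    return (n_matches - expected) / n_centres

realisations = [q0_from_random_centres(seed=s) for s in range(3)]
print(realisations)
```

With only a few hundred centres the scatter in Q0 is much larger than the 0.006 quoted for the full catalogue, since the Poisson noise on the excess counts falls roughly as the inverse square root of the number of centres.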



2.3.2 LR analyses of SDSS galaxies 

We also performed a test in which we replaced the SPIRE positions
in our LR analysis with the positions of SDSS galaxies, while
retaining the 250 μm fluxes and errors in order to accurately reproduce
the positional uncertainties according to the method given in
section 2.1. We found that we recovered 97.2% of the SDSS galaxies
at R ≥ 0.8, which reduced to 93.9% when the SDSS positions
were varied according to a Gaussian positional offset with standard
deviation appropriate for the signal-to-noise ratio of the real
SPIRE sources (according to the rescaled formula given in section
2.1). Comparing this value to our overall ID rate we see that we are
not typically missing counterparts because we have underestimated
their positional errors, but because of the fact that approximately
39% of 250 μm sources (the actual value is 1 - Q0) are not detected
in SDSS r-band data down to the magnitude limit of our
survey.

We also compared our new catalogues with the small subsets
of overlapping objects in the Imperial IRAS-FSC Redshift Catalogue
(IIFSCz, Wang & Rowan-Robinson, 2009) and FIRST surveys
(Becker, White & Helfand, 1995). These comparisons are presented
in detail in appendix A, but to summarize:

• We find updated positions for five IIFSCz galaxies which previously
had misidentified or unidentified counterparts.

• Our catalogue is consistent with a catalogue of FIRST radio
sources matched to the 250 μm sample, provided that the radio
sources are detected in our optical data, and that the optical counterparts
contain only single components at moderate separations from
the 250 μm centroids.

• A small collection of Spitzer Space Telescope snapshot images
taken at near infrared wavelengths reinforces our belief in the
accuracy of our method.



3 REDSHIFTS IN THE HERSCHEL-ATLAS 9HR FIELD

3.1 Spectroscopic Redshifts 

The GAMA catalogue (Driver et al., 2010) contains 12,626 new
spectroscopic redshifts in the Herschel-ATLAS SDP region for
sources satisfying the GAMA target selection criteria (including
magnitude limits of r < 19.4, z < 18.2 & K < 17.6 - Baldry et
al. 2010). In addition, there are a further 3281 redshifts available in
this region from the SDSS DR7, 248 from the 2SLAQ-LRG survey
(Cannon et al., 2006), 939 from the 2SLAQ-QSO survey (Croom et
al., 2009) and 29 from the 6dFGS (Jones et al., 2009). A total of 1099
spectroscopic redshifts for reliable counterparts were collated (including
those from the SDSS DR7, 6dFGS, 2SLAQ-QSO/LRG surveys
and the GAMA catalogue), meaning that 41.0% of our R > 0.8
counterparts have spectroscopic redshifts (and 15.0% of all ≥ 5σ
250 μm sources). We note that none of the spectroscopic catalogues
(with the exception of the 6dFGS catalogue) extends to declinations
less than -1 deg. The number of redshifts for reliable counterparts
from each spectroscopic survey is presented in table 2, and the redshift
properties of 250 μm selected galaxies are discussed in section



3.2 Photometric Redshifts 

For those sources without spectra, we estimate photometric red- 
shifts using optical and near-infrared photometry. The Herschel- 
ATLAS SDP field has almost complete optical coverage in ugriz 
from the SDSS DR7 and near-infrared YJHK photometry from
the 7th data release of the UKIDSS Large Area Survey (Lawrence et
al. 2007). As well as having spectroscopy from GAMA and SDSS,
these very wide-area surveys overlap with several deeper spectroscopic
surveys (Davis et al., 2003, Cannon et al., 2006, Davis et al.,
2007, Lilly et al., 2007) which allow us to construct a spectroscopic
training set with large numbers of objects (> 1000 per bin of unit
magnitude or 0.1 in redshift) up to r-band magnitudes r < 23 and
redshifts z < 1.0, i.e. to approximately the photometric depth of
SDSS and UKIDSS-LAS.

This large and relatively complete training set allows us to
use an empirical regression method to estimate the photometric
redshifts. We use the well-known artificial neural network code
ANNZ (Collister & Lahav, 2004) with a network architecture of
N : 2N : 2N : 1, where N is the number of photometric
bands used as inputs; although there are 9 photometric bands
(ugrizYJHK) available in this case, we ignore bands where an
object has no coverage or where the photometry is flagged as dubious,
and train separate neural networks for all combinations of
bands with at least three good detections. An advantage of ANNZ
is that it provides redshift error estimates, σ_z, based on the photometric
errors it is supplied with; we checked that these errors were
distributed correctly by confirming that, for a set of validation data
with spectroscopic redshifts, (z_phot − z_spec)/σ_z follows a Gaussian
distribution centred on zero. However, we found that the width of
the best-fitting Gaussian was ≈ 1.4, indicating that the errors were
underestimated by this factor on average. To improve the accuracy
of the error estimates, we used the width of this distribution to correct
the error estimates individually for each trained network, with
correction factors of typically 1.3 to 2.
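The error-rescaling step described above can be sketched as follows. This is a minimal illustration rather than the ANNZ workflow itself: it uses the standard deviation of the normalised residuals as a stand-in for the width of the best-fitting Gaussian, and the synthetic validation set (quoted errors of 0.03, true scatter 1.4 times larger) and the function names `pull_width` and `rescale_errors` are assumptions made for the example.

```python
import random
import statistics


def pull_width(z_phot, z_spec, sigma_z):
    """Standard deviation of the normalised residuals (z_phot - z_spec) / sigma_z."""
    pulls = [(zp - zs) / s for zp, zs, s in zip(z_phot, z_spec, sigma_z)]
    return statistics.pstdev(pulls)


def rescale_errors(sigma_z, factor):
    """Multiply every quoted error by the measured width."""
    return [s * factor for s in sigma_z]


# Synthetic validation set (illustrative values, not survey data): the quoted
# errors are 0.03, but the true scatter is 1.4 times larger.
rng = random.Random(42)
z_spec = [rng.uniform(0.0, 1.0) for _ in range(20000)]
sigma_quoted = [0.03] * len(z_spec)
z_phot = [z + rng.gauss(0.0, 1.4 * 0.03) for z in z_spec]

factor = pull_width(z_phot, z_spec, sigma_quoted)
sigma_corrected = rescale_errors(sigma_quoted, factor)
print(f"correction factor = {factor:.2f}")
```

After the correction, the normalised residuals computed with `sigma_corrected` have unit width by construction, which is the property the validation check is looking for.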

Confirming the accuracy of empirical photometric redshifts is 
always difficult, since the objects for which we have spectroscopic 
redshifts for comparison are, by necessity, drawn from the same 
sample which is used to train the neural network, and may not be 
fully representative of the whole population of objects to which the 
method is applied. A particular concern in our case was that the 
Herschel counterparts are likely to have a different distribution in 
colour space than the training-set galaxies, and so any bias in the 
photometric redshift as a function of colour could cause the average 
redshift of Herschel counterparts to be systematically wrong. How- 
ever, we satisfied ourselves that this was not a problem by looking 
for trends in the difference between photometric and spectroscopic 
redshift as a function of colour, and finding no significant trend 
with any colour. For example, the best-fitting straight line relationship
between (z_phot − z_spec) and (r − K), for objects in our validation
dataset, has a gradient of 0.00035, two orders of magnitude
smaller than the scatter of (z_phot − z_spec), which for the same sample
has a standard deviation of 0.037 overall.
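The colour-trend check can be illustrated with an ordinary least-squares fit of the redshift residual against colour. The sketch below assumes synthetic data with no underlying trend and a residual scatter of 0.037; the colour range and the function name `fit_gradient` are hypothetical choices for the example, not values taken from the validation set.

```python
import random


def fit_gradient(x, y):
    """Least-squares gradient of the straight line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic validation sample: colours spread over an assumed range and
# redshift residuals with scatter 0.037 that do not depend on colour.
rng = random.Random(1)
colour = [rng.uniform(0.0, 4.0) for _ in range(20000)]
delta_z = [rng.gauss(0.0, 0.037) for _ in colour]

gradient = fit_gradient(colour, delta_z)
print(f"gradient = {gradient:.5f} per magnitude of colour")
```

A gradient much smaller than the residual scatter, as found here, is the signature of no significant colour-dependent bias.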

In the event that the counterparts are so obscured that they are 
invisible in the optical data, they will clearly have unreliable pho- 
tometric redshifts (this is inevitable, given the small number of de- 
tections that would be available), but this scenario will become ap- 
parent as large errors on the photometric redshifts of these sources. 
In any case, these sources will not pass our r < 22.4 mag selec- 
tion criterion. With this new catalogue of photometric redshifts, all 
sources detected in at least three photometric bands have either a 
spectroscopic or photometric redshift. 



3.3 Redshift distribution and completeness of
reliably-identified 250 μm sources

Using a method analogous to that used in section 2.2 to calculate
the r-band magnitude distribution of counterparts to 250 μm
sources, we may determine the completeness of our reliably-matched
250 μm and SDSS objects as a function of redshift. Here
we define the completeness as the fraction of reliably identified
counterparts, compared with all of those counterparts that are detected
in our data, i.e. Q0 × 6621 ≈ 3925 sources. First, we use the
star-galaxy separation method from Baldry et al. (2010; discussed
in section 4.4) to ensure that only galaxies remain in our sample.
We then use the catalogue of photometric and spectroscopic redshifts
discussed in section 3.2 to determine the distribution of the
galaxy redshifts in the catalogue, n(z). Since we know that this
catalogue covers an area of 36.0 deg², we can scale to the total
Table 2. The number of spectroscopic redshifts for reliable counterparts from each survey used in our final catalogue. The majority of the spectroscopic
redshifts used in our follow-up studies of the reliable 250 μm associations come from GAMA. There are a total of 1099 R ≥ 0.80 250 μm counterparts with
spectroscopic redshifts.

Survey       Reference                  Number of redshifts      Percentage of
                                        R ≥ 0.8    Catalogue     Herschel-ATLAS IDs

2SLAQ-LRG    Cannon et al. (2006)           3          248          0.12
2SLAQ-QSO    Croom et al. (2009)            4          939          0.17
6dFGS        Jones et al. (2009)           12           29          0.50
GAMA         Driver et al. (2010)         766        12626         31.6
SDSS DR7     Abazajian et al. (2009)      316         3281         13.0





Figure 6. Top: total(z) (dotted histogram) represents the photometric redshift
distribution of all of the sources within 10.0 arcsec of a 250 μm position,
while n(z) (solid line) represents the expected background based on
the area covered and the redshift catalogue number counts. We note that the
histogram of background sources is consistent with the results of Oyaizu
et al. (2008). Bottom: real(z) (solid line) shows the redshift distribution of
the excess sources above the background around the 250 μm centres, and
is compared with the dotted line which shows the photometric redshift distribution
for those counterparts with R ≥ 0.8 in our LR analysis. We note
that real(z) peaks at a lower redshift than the intrinsic n(z) for SDSS galaxies
(solid histogram, top panel). The shaded histogram shows the number of
spectroscopic redshifts for galaxies (as defined by the star-galaxy separation
criteria in Baldry et al., 2010) in our sample. The percentage completeness
in our catalogue is given in Table 3.



sky area searched around the 250 μm sources (6621 × π r_max²) to
determine the expected background, n(z) (see Figure 6). Subtracting
n(z) from total(z) (the z distribution of all sources within 10.0 arcsec of
the 250 μm centres) then gives us the number of excess
sources around the 250 μm positions, real(z), and by comparing
this with the photometric redshift distribution for those sources
with R ≥ 0.8 we can estimate the completeness of our reliable catalogue
as a function of z. The completeness values for our reliable
catalogue in bins between 0.0 < z < 1.1 are presented in Table 3.
Figure 6 also shows the redshift distribution of 250 μm-selected
Herschel-ATLAS sources with reliable counterparts and



Table 3. The percentage completeness of our reliable catalogue as a function
of photometric redshift. These values are derived as described in section
3.3 using a method analogous to that applied to determine the r-band magnitude
distribution of 250 μm sources. The errors are determined assuming
that they are dominated by the Poissonian errors on real(z_phot).

z_phot       Completeness (%)    σ_comp

0.0-0.1      93.2                7.5
0.1-0.2      83.2                4.8
spectroscopic redshifts in our GAMA/SDSS DR7 catalogue. The
redshift distribution of reliable r-band counterparts peaks at 



Zphot = 0.25 ± 0.05, with a median value of Zph, 



0.31 



-t-0.28 



or Zspcc = O.lSlQjijg if only spectroscopic redshifts are included 
(here the errors on the peak are based on the half-width of one his- 
togram bin, and those on the median values are derived according to 
the 16th and 86th percentiles of the redshift cumulative frequency 
distribution). The disparity between these median redshift values 
is to be expected since the photometric redshifts are computed to 
fainter magnitude limits than the spectroscopic redshifts have been 
measured in the GAMA 9hr catalogue. 
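The background subtraction used in this section can be sketched as below. The survey numbers (6621 sources, a 10 arcsec search radius, a 36.0 deg² redshift catalogue) are taken from the text; the per-bin histogram values and the function names are invented for illustration only.

```python
import math


def search_area_deg2(n_sources, radius_arcsec):
    """Total sky area inside the search radius of every source, in deg^2."""
    return n_sources * math.pi * (radius_arcsec / 3600.0) ** 2


def expected_background(catalogue_counts, catalogue_area_deg2, area_deg2):
    """Scale the catalogue n(z) histogram to the area actually searched."""
    scale = area_deg2 / catalogue_area_deg2
    return [count * scale for count in catalogue_counts]


def excess(total, background):
    """real(z) = total(z) - n(z), bin by bin."""
    return [t - b for t, b in zip(total, background)]


def completeness_percent(reliable, real):
    """Percentage of the excess sources that are reliably identified."""
    return [100.0 * r / e if e > 0 else float("nan")
            for r, e in zip(reliable, real)]


area = search_area_deg2(6621, 10.0)

# Toy histograms for two redshift bins (hypothetical numbers).
catalogue_counts = [20000.0, 60000.0]   # galaxies per bin in the 36 deg^2 catalogue
total = [400.0, 900.0]                  # galaxies per bin within 10 arcsec
reliable = [280.0, 500.0]               # galaxies per bin with R >= 0.8

background = expected_background(catalogue_counts, 36.0, area)
real = excess(total, background)
completeness = completeness_percent(reliable, real)
print(f"searched area = {area:.4f} deg^2")
```

The searched area is a small fraction of a square degree, so the background correction is modest in each bin but not negligible.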

It is also interesting to note that the photometric redshift distribution
of excess 250 μm sources, real(z_phot), peaks at a lower
redshift than the intrinsic redshift distribution of the SDSS photometric
redshift catalogue, n(z_phot). This is not due to our inability
to reliably identify sources at higher redshifts, since the n(z)
includes statistical non-detections, and indeed the magnitude distribution
of the true counterparts (calculated in section 2.2) peaks
at brighter magnitudes than the background n(m). This determination
of real(z_phot) indicates that the low redshift population of
Herschel-ATLAS galaxies in our 5σ sample is generally at lower
redshift than the average SDSS galaxy, raising an interesting question
about the ~40% of Herschel-ATLAS sources which do not
have a counterpart above the SDSS DR7 limit. The redder sub-mm
colours of these blank field sources suggest that they are at much
higher redshifts (see Figure 7 and Section 4.1), and furthermore,
the study of Herschel-ATLAS colours by Amblard et al. (2010) indicates
a second population of sources at z ~ 2. The fact that we do
not see a rising n(z) for Herschel-ATLAS sources in SDSS out to
the SDSS limit suggests that the total Herschel-ATLAS n(z) is bimodal,
with a low-redshift peak at z ~ 0.35 ± 0.05 (where the error
Figure 7. Histograms of the SPIRE colours for those sources with reliable
galaxy counterparts (R ≥ 0.8, shaded black), those for which the most reliable
candidate has R < 0.8 (shaded red), and those remaining sources in
the MAD-X catalogue without any SDSS r-band counterparts. All sources
are detected at ≥ 5σ in the 250 μm band, and ≥ 3σ in the 350 μm band.
Those galaxies for which we identify R ≥ 0.8 counterparts have considerably
bluer average S250/S350 colours than those for which we are unable
to identify a counterpart.



is once more derived according to the width of one histogram bin)
and a higher-redshift peak at z > 1. Such behaviour is predicted in
several models of sub-mm galaxy populations (e.g. Lagache et al.
2004; Negrello et al. 2007; Wilman et al., 2010) and also suggested
by stronger clustering in samples of Herschel-ATLAS galaxies selected
to have redder far-infrared colours (Maddox et al. 2010), as
well as the steep upturn in the Herschel-ATLAS number counts
at fluxes below 100 mJy (Clements et al. 2010) and the results of
BLAST, which include deeper optical samples with fainter spectroscopy,
albeit with smaller object samples by more than an order
of magnitude (Dunlop et al. 2010 and Chapin et al. 2010).



at ≥ 5σ in the 250 μm band, and ≥ 3σ in the 350 μm band.
The sources have been divided into three sub-sets: those sources
with reliable r-band counterparts classified as galaxies (R ≥ 0.8,
black shaded histogram), those for which the most reliable candidate
has R < 0.8 (red shaded histogram) and all of the remaining
sources in the MAD-X catalogue without any r-band counterparts
(grey shaded histogram). The median SPIRE colours for
the three samples are quite different, with median colours of
S250/S350 ~ 1.51, 1.23, & 1.16 for the three respective samples. It
is clear that the R ≥ 0.8 sources are considerably bluer than
the SPIRE sources without (reliable) counterparts, and a series
of Kolmogorov-Smirnov tests confirms that no two of the three
sets of histograms are drawn from the same parent distribution at
> 99.9999% confidence.
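For readers unfamiliar with the test, a two-sample Kolmogorov-Smirnov comparison can be sketched as below. This is a generic illustration, not the analysis code used here: the p-value uses a standard asymptotic approximation, and the two synthetic colour samples (Gaussians centred on the quoted medians of 1.51 and 1.16, with an assumed dispersion of 0.3 and assumed sample sizes) are invented for the example.

```python
import math
import random


def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_pvalue(d, na, nb):
    """Asymptotic probability of a distance >= d under the null hypothesis."""
    n_eff = na * nb / (na + nb)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; p is ~1
    total = sum(2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, total))


# Synthetic S250/S350 colours (assumed dispersion and sample sizes).
rng = random.Random(3)
reliable = [rng.gauss(1.51, 0.3) for _ in range(2000)]
blank = [rng.gauss(1.16, 0.3) for _ in range(2000)]

d = ks_statistic(reliable, blank)
p = ks_pvalue(d, len(reliable), len(blank))
print(f"D = {d:.3f}, p = {p:.3g}")
```

A confidence of > 99.9999% that two samples differ corresponds to a p-value below 1e-6 in this formulation.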

The differences between the colour populations may be due to 
those sources without reliable counterparts residing at higher red- 
shift than those for which we can identify reliable counterparts, 
causing the peak of each such source's far-infrared spectral energy 
distribution to move to longer wavelengths. 

It is also clear that those sources for which the most reliable
candidate counterpart has R < 0.8 are a different population from
those for which we identify no r-band counterparts. Half of these
sources can be explained by the expected number which are above 
the SDSS limit but for which we cannot determine a reliable coun- 
terpart, and the other half simply have an unrelated SDSS r-band 
source within the search radius. This is reflected in the histogram 
for these sources having colours intermediate between the reliable 
counterparts and the "no potential counterparts" samples - it con- 
tains roughly equal fractions of both types of object (presumably 
high and low redshift). 

We also compare the far-infrared colours with the results of
Amblard et al. (2010; their Figure 1 and our Figure 9). In this
figure (which uses the same colour scheme as Figure 7, with the
R ≥ 0.8 counterparts in black) we consider only those sources at
≥ 5σ in the 250 μm and 350 μm bands and ≥ 3σ in the 500 μm
band to ensure a fair comparison. We identify 133 such sources
with R ≥ 0.8 counterparts in our SDSS r-band catalogue.



4 RESULTS 

The resulting values for the likelihood ratio and reliability for the
6621 5σ sources in the 250 μm selected catalogue are shown in
Figure 4. Approximately 58% of galaxies with S_250μm ≥ 32 mJy
are detected in our r ≤ 22.4 mag SDSS catalogue, and of those we
identify 2423 counterparts with reliability R ≥ 0.8, which we consider
robust. Of these reliable counterparts, 1252 also have GALEX
detections in at least one ultraviolet band, and each source has either
a reliable spectroscopic redshift (1099 galaxies) or photometric
redshift.

Figure 8 shows the fractional completeness in our identification
catalogue as a function of the 250 μm flux, and of the SDSS
DR7 r-band magnitude of the counterparts. The shaded areas indicate
the 1σ uncertainty on the completeness derived from the Poisson
errors on the number of sources brighter than a given
magnitude/flux.



4.1 The sub-millimetre colours of SPIRE sources 

In Figure 7 we display the S250/S350 colours of the 250 μm
sources from the Herschel-ATLAS SDP catalogue with detections



4.2 Lensed sources in H-ATLAS

Wide-field sub-millimetre wavelength surveys such as the
Herschel-ATLAS are particularly well-suited to detecting large
numbers of strongly-lensed sources (e.g. Blain, 1996, Negrello et
al., 2007), in which intrinsically faint distant galaxies may be magnified
by an otherwise unrelated foreground massive object along
the line of sight (e.g. galaxy, galaxy cluster), and observed at more
readily-detectable flux densities. Strong lensing can not only amplify
the brightness of these distant sources, but also increase their
angular size, allowing galaxies to be studied on scales smaller than
would otherwise be possible, making samples of strongly-lensed
galaxies an important cosmological probe (e.g. Swinbank et al.
2010). At bright 500 μm flux densities (> 100 mJy), after removing
very local galaxies and blazars from the source counts, the surface
density of sources on the sky is dominated by strongly-lensed
galaxies, with large-area surveys such as Herschel-ATLAS required
to detect them due to their paucity on the sky (~0.5 deg⁻²).
This method of selecting lensed sources has one huge advantage
over other methods, in that the selection efficiency is almost 100
percent (Negrello et al., 2010).

Of particular interest in Figure 9 is the positioning of the 58
R ≥ 0.8 galaxies with S250/S350 ≤ 1.5. The redshift loci of
Arp220 and M82 templates from Silva et al. (1998), shown in green



Figure 8. Left: Completeness of our identification catalogue as a function of 250 μm flux. The ordinate indicates the fraction of 250 μm sources brighter than
the flux given by the abscissa, for which we have reliably identified counterparts in the SDSS DR7 r-band data. We can reliably identify the counterparts to
36.6 percent of the 6621 sources in our 5σ 250 μm-selected catalogue down to the limit of our r-band SDSS DR7 catalogue. Right: The fraction of sources
with statistical identifications in our SDSS DR7 r-band catalogue (i.e. real(m)) which can be reliably identified with R > 0.80, not accounting for the value
of Q0. We reliably identify ~63% of those counterparts that are detected in our r < 22.4 mag survey data. The shaded regions indicate the 1σ uncertainties
in these completeness fractions, determined based on the Poisson errors on the number counts.
Figure 9. SPIRE colour-colour diagram, showing the colours of 5σ 250 μm
and 350 μm sources with > 3σ detections in the 500 μm band. The colour
scheme is as in Figure 7. The green and blue lines represent redshift colour
tracks between 0.0 < z < 5.0 based on Arp220 and M82 template SEDs
from Silva et al., 1998, with solid circles along the tracks indicating the locations
in colour-colour space of integer redshifts between these two values
for that template.



and blue respectively, suggest that sources with such colours reside
at z > 1.0, despite the photometric and spectroscopic redshifts of
their reliable optical counterparts residing at z < 1.0 (Figure 6).
This disparity is, for some of these sources, caused by the blending
of galaxies in the 350/500 μm bands (the results of Rigby et al.,
2010, suggest that the extracted 500 μm flux densities of more than
a quarter of > 5σ sources are enhanced by factors of up to ~2 due
to multiple sources residing within a beam, for example). However,
it is also possible that some of these are intrinsically high-redshift
far-infrared sources which are strongly lensed by low-redshift
foreground galaxies. The models of Negrello et al. (2007)
predict that the fraction of lensed sources to these sensitivity limits
is ~4%. In our catalogue, ~14% of Herschel-ATLAS sources with
low-redshift R ≥ 0.8 counterparts have S250/S350 ≤ 1.5, consistent
with high-z galaxies (209 out of 1480 sources that are detected
at ≥ 5σ at 250 and 350 μm and ≥ 3σ at 500 μm). These numbers
suggest that approximately one third of these sources may be
strongly-lensed galaxies, although more realistic simulations will
be required to thoroughly test this interesting hypothesis.
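The arithmetic behind the "one third" estimate can be written out explicitly. The sketch below uses only the numbers quoted in the text and assumes, as the argument implicitly does, that all of the lensed sources predicted by the models fall within the red-colour subset; the variable names are illustrative.

```python
# Numbers quoted in the text.
n_red = 209                        # reliable IDs with red, high-z-like colours
n_bright = 1480                    # reliable IDs meeting the three-band detection cuts
predicted_lensed_fraction = 0.04   # Negrello et al. (2007) model prediction

# Fraction of reliably identified sources with red colours (~14%).
red_fraction = n_red / n_bright

# If every predicted lens lands in the red subset, the share of red sources
# that are lensed is the ratio of the two fractions.
lensed_share_of_red = predicted_lensed_fraction / red_fraction

print(f"red fraction = {100 * red_fraction:.1f}%")
print(f"lensed share of red sources = {100 * lensed_share_of_red:.0f}%")
```

The ratio comes out a little below 30 per cent, which is the basis for describing the lensed share as approximately one third.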



4.3 Multiple sources 

We may also use our catalogue to identify 250 μm sources with
multiple counterpart galaxies, by considering those which have at
least one L > 5.0, R < 0.80 optical source within 10 arcsec
(of course, this will also select low-probability superpositions of
sources on the sky, e.g. Arp, 1967). There are a total of 118 such
250 μm sources which have at least one counterpart with L > 5.0
and R < 0.8 in our catalogue. It is possible that these sources
contain multiple interacting counterparts, and indeed four of these
sources have at least two counterparts with spectroscopic redshifts
with Δz < 0.001 (including one of the radio sources mentioned in
appendix A2, H-ATLAS J090631.3-004605). SDSS three-colour
images of each spectroscopically-confirmed galaxy interaction are
shown in Figure 10. There may be further examples for which we
do not have spectroscopic redshifts.

For a more "complete" sample of cross-identifications,
sources above some threshold in L could be considered; however, in
this case there is no immediate information to decide which of the
multiple counterparts contributes most to the SPIRE flux, without
resorting to priors on e.g. the colours of sources (e.g. Roseboom
et al., 2009). This is work which we are pursuing and will look to
implement in our next data release.

Finally, there is one additional source (H-ATLAS 090130.2-00215)
with two high LR counterparts that have differing spectroscopic
redshifts. This 250 μm source has counterparts with L =
15.8 & 35.0, residing at z_spec = 0.196 and z_spec = 0.255,
respectively. The latter counterpart is also a P < 0.20 radio source,
mentioned in appendix A2, and presumably constitutes one of the
low-probability superpositions mentioned above. We also note that
merging sources may have real positional offsets between the dust
emission in the far-infrared and the starlight which dominates the
Figure 10. SDSS gri colour images centred on the positions of four SPIRE 250 μm 5σ sources with at least one counterpart with L > 5.0 and R < 0.8, and
with at least two spectroscopic redshifts within Δz = 0.001 of each other (indicated by the red dashed circles). Each image is 40 arcsec on a side, orientated
such that North is up and East is to the left.



optical (see e.g. Zhu et al., 2007, Ivison et al., 2008, or Smith et al., 
2010a). 



4.4 Unresolved sources 

Although the main focus of this paper is the reliable identification
of galaxies selected at 250 μm, we have also applied the LR
method separately to identify any reliable unresolved sources. We
have used our spectroscopic data set to further split the population
of unresolved sources into groups of candidate stars and QSOs.
There are a total of five R ≥ 0.80 unresolved sources in the 250 μm
selected sample (green asterisks in Figure 2), of which three occupy



the stellar colour-colour locus, while two have colours or spectro- 
scopic redshifts consistent with being QSOs. 

Studying these objects in detail is beyond the scope of this
paper, but see e.g. Thompson et al. (2010) for a discussion of stellar
sources in the Herschel-ATLAS and the identification of two
candidate debris disks.



5 CONCLUSIONS 

We have demonstrated that the likelihood ratio method of Sutherland
& Saunders (1992) is an appropriate way to determine reliable
counterparts for 250 μm-selected galaxies from the Herschel-ATLAS
science demonstration phase observations in the SDSS
DR7 r-band observations of the GAMA 9 hour field. We have
determined reliable (R ≥ 0.8) counterparts to 2423 out of 6621
sources detected at a SNR ≥ 5, and found that ~59.3% have counterparts
brighter than r = 22.4 (the limit of our catalogue). We identify
reliable counterparts to 36.6% of the 250 μm sources (2423 out
of 6621), and our calculations in section 2.2 suggest that our sample
is 61.8% complete down to the SDSS r-band limit of our catalogue,
in the sense that we have reliably identified 2423 counterparts
out of the Q0 × 6621 ≈ 3925 counterparts that are actually
detected in the SDSS DR7 data.
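The relationship between the headline fractions quoted above can be checked with a few lines of arithmetic. In this sketch Q0 is taken to be 0.593, the rounded value implied by the ~59.3% quoted in the text, so the results agree with the quoted figures only to within rounding.

```python
n_sources = 6621     # 5-sigma 250 micron sources
n_reliable = 2423    # counterparts with R >= 0.8
q0 = 0.593           # assumed: fraction with a counterpart above the r-band limit

# Number of counterparts expected to be detected in the optical data.
n_detected = q0 * n_sources

# Fraction of all 250 micron sources with a reliable identification.
id_rate = n_reliable / n_sources

# Completeness relative to the counterparts that are actually detectable.
completeness = n_reliable / n_detected

print(f"detected counterparts ~ {n_detected:.0f}")
print(f"identification rate   = {100 * id_rate:.1f}%")
print(f"completeness          = {100 * completeness:.1f}%")
```

The identification rate and the completeness differ by exactly the factor Q0, which is why roughly 37 per cent of all sources but roughly 62 per cent of the detectable counterparts are reliably identified.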

We show from a consideration of their sub-mm colours that
those sources without optical counterparts appear to reside at higher
redshifts than those with optical counterparts in our available ancillary
data. We compute the completeness of our reliable catalogue as
a function of redshift, and find that Herschel-ATLAS sources with
SDSS counterparts have a lower median redshift than the general
SDSS population, suggesting a bimodal n(z) for Herschel-ATLAS
sources. For this bimodal n{z), we find that the lower redshift pop- 
ulation has a median redshift of z = 0.401q jg (with the errors 
calculated according to the 16th and 86th percentiles of the red- 
shift cumulative frequency distribution), and that the high redshift 
population peaks at z > 1. We also find evidence for a population
of sub-millimetre-selected interacting galaxies, and suggest a
possible method for selecting samples of strongly-lensed galaxies.
Finally, we find five new positions for IRAS-FSC/IIFSCz sources
based on our LR analysis and higher-resolution PACS and SPIRE
data.

The UV/optical/near-infrared identifications to the 250 μm-selected
sample, as well as their photometric and spectroscopic
redshifts, are available for download from the Herschel-ATLAS
webpage: http://www.h-atlas.org
APPENDIX A: CHECKING THE IDENTIFICATION PROCESS

A1 IRAS sources

In the 9hr field SDP region, there are a total of 35 detections
from the Imperial IRAS-FSC Redshift Catalogue (IIFSCz, Wang
& Rowan-Robinson, 2009, building on the IRAS Faint Source Catalogue
of Moshir et al. 1992), the majority with associated
optical/near-infrared positions of high reliability. By matching our 250 μm
selected catalogue with the IIFSCz, and comparing the results to
our LR analyses, we can provide a first check on the accuracy of
our associations. There are 30 sources in the IIFSCz that have catalogue
positions within 10.0 arcsec of the 250 μm source positions.
Each of the positions in the IIFSCz catalogue for these sources
matches an SDSS DR7 position within 2 arcsec, and has reliability
R > 0.80.

There remain five IIFSCz sources for which we do not recover
SDSS/SPIRE matches within 10.0 arcsec. In Figure A1 we show
greyscale images of the Herschel-ATLAS PACS 100 μm observations
of the regions surrounding these five IIFSCz sources, centred
on the quoted IIFSCz catalogue positions. It is clear that each
IIFSCz source has a bright PACS detection less than one arcminute
away, with the PACS 100 μm observations being of considerably
higher sensitivity and resolution than that of IRAS at 60 μm (this is
the band on which the IIFSCz is selected).

Here we discuss each of these sources individually. 

• F08555+0145: This source has an IIFSCz position derived
from the SDSS DR6, residing approximately 40 arcsec away from
the IRAS centroid. The bright PACS/SPIRE source within the 1σ
positional errors of the IRAS centre is associated with an r =
17.9 mag galaxy approximately 1 arcsec away, which is not in the
SDSS primary photometry catalogue. It is clear that this is the correct
association, with a reliability based on its newly-measured
magnitude and separation of R = 0.999, and that this source was
mis-identified in the IIFSCz.

• F08598-0103: IIFSCz contains only an IRAS-derived position
for this source in the absence of any counterparts detected in
the ancillary data available at the time. Using our higher-resolution
PACS/SPIRE observations, we are able to identify the optical counterpart,
approximately 30 arcsec away from the IIFSCz position,
with R = 0.999.

• F08599+0139: The IIFSCz position for this source is also derived
from the SDSS DR6, suggesting a source approximately 10
arcsec to the North of the IRAS position. Herschel-ATLAS data,
however, reveal a bright sub-millimetre source ~30 arcsec to the
East, associated with an r-band counterpart at R = 0.994. This
source was therefore also mis-identified in the IIFSCz catalogue.

• F09009-0054: The IIFSCz catalogue position for this source
was derived using data from the NRAO VLA Sky Survey (NVSS,
Condon et al., 1998), and resides within the extended stellar halo
of the z = 0.04 galaxy 2MASX J09033081-0106127 in the
SDSS/UKIDSS-LAS data (which is also detected in each of the PACS and
SPIRE bands). The higher resolution of the SDSS DR7 catalogue
compared with the NVSS data and IRAS positions enables us to
derive a more accurate position for the counterpart to this source,
with R = 0.999.

• F09047-0040: The PACS/SPIRE detection of this source is located
approximately 40 arcsec away from the IRAS position quoted
in the catalogue. We identify an SDSS DR7 optical counterpart
with a more accurate position, and reliability R = 0.999.

Table A1 contains new positions for our reliable counterparts
to these IRAS sources. Assuming that our new identifications to the
IRAS sources are correct, we recover reliable counterparts (with accurate
positions) to all of the IIFSCz sources at R ≥ 0.8, as compared
with ~89 percent (31/35) of sources for the IIFSCz itself (we
exclude the two sources with clearly mis-identified counterparts,
and also the two sources with no identified counterparts).



A2 A comparison with radio observations 

We also compared the results of our likelihood ratio analysis to 
data from the Faint Images of the Radio Sky at Twenty centime- 
tres (FIRST) Survey (Becker, White & Helfand, 1995). The FIRST 
survey covers 9,000 square degrees of sky with a resolution of 5 
arcsec, with a source density of approximately 90 per square degree
brighter than the detection threshold of 1 mJy. At these relatively
bright flux limits, the source population is dominated by
Active Galactic Nuclei (AGN) rather than star-forming sources 
(e.g. Wilman et al., 2008); as a result the overlapping population 
of sources between the Herschel-AThAS and FIRST catalogues is 
not expected to dominate the number counts. 

To make the comparison between our LR analysis and FIRST
sources, we used the frequentist identification procedure of Downes
et al. (1986), commonly used to quantify the formal significance
of possible counterparts to sub-millimetre galaxies in radio survey
data (e.g. Lilly et al., 1999, Ivison et al. 2007). In this procedure, the
statistic used to assess the probability that a nearby radio source is
not associated with the SPIRE source is S = π r² × n(> F), where
r is the angular distance between the SPIRE source and the radio
source, F is the flux density of the radio source, and n(> F) is
the surface density of radio sources with flux densities greater than
this. For each SPIRE source, we looked for radio sources in the
FIRST catalogue within 10.0 arcsec, and treated the radio source
with the lowest value of S (S_min) as the one most likely to be associated
with the SPIRE source. We used a Monte-Carlo simulation
(e.g. Eales et al. 2009) to determine the probability distribution of
S_min on the null hypothesis that there are no genuine associations
between radio sources and SPIRE sources.
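The S statistic and its null distribution can be illustrated with a small Monte-Carlo sketch. This is a simplified stand-in for the simulation used here, under stated assumptions: unrelated radio sources are Poisson-distributed on the sky with the quoted density of 90 per square degree, and for a flux drawn at random from the counts the ratio n(> F)/n_total is uniform between 0 and 1. All function names are hypothetical.

```python
import math
import random

ARCSEC2_PER_DEG2 = 3600.0 ** 2


def s_statistic(r_arcsec, density_per_arcsec2):
    """S = pi * r^2 * n(>F): expected number of brighter sources within r."""
    return math.pi * r_arcsec ** 2 * density_per_arcsec2


def poisson_draw(rng, mean):
    """Knuth's multiplication method, adequate for small means."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def null_s_min(rng, radius_arcsec, total_density):
    """S_min at one random sky position; infinity if no source is nearby."""
    mean = s_statistic(radius_arcsec, total_density)
    best = math.inf
    for _ in range(poisson_draw(rng, mean)):
        r = radius_arcsec * math.sqrt(rng.random())      # uniform over the disc
        density_brighter = total_density * rng.random()  # n(>F) for a random flux
        best = min(best, s_statistic(r, density_brighter))
    return best


def chance_probability(s_obs, null_values):
    """Fraction of random positions with S_min at least as small as s_obs."""
    return sum(1 for s in null_values if s <= s_obs) / len(null_values)


rng = random.Random(7)
density = 90.0 / ARCSEC2_PER_DEG2   # sources per arcsec^2
null_values = [null_s_min(rng, 10.0, density) for _ in range(200000)]

s_max = s_statistic(10.0, density)  # largest finite S possible within 10 arcsec
p_any = chance_probability(s_max, null_values)
p_close = chance_probability(s_statistic(2.0, 0.5 * density), null_values)
print(f"P'(any source within 10 arcsec) = {p_any:.4f}")
print(f"P'(r = 2 arcsec, n(>F) = n/2)   = {p_close:.5f}")
```

Under these assumptions the chance of finding any unrelated source within 10 arcsec is itself only about 0.002, so every candidate within the search radius has a small chance probability before any correction for the size of the sample.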

We then used this probability distribution to determine the
probability that each measured value of S_min would have occurred
by chance. We call this probability P'. Of the 6621 5σ 250 μm
SPIRE sources, 105 have radio counterparts within 10.0 arcsec, all
with values of P' < 0.002. However, this does not take account
of the fact that with such a large sample of SPIRE sources one
Figure A1. PACS 100 μm greyscale cutout images showing the regions surrounding the five IIFSCz catalogue positions for which we do not find a match
within 10.0 arcsec in our SDSS/SPIRE catalogue. The IIFSCz catalogue positions are denoted by a black cross (2 derived from SDSS positions, 1 from
NVSS and 2 from the original FSC), with the SDSS DR7 r-band contours overlaid in blue and the IRAS-FSC 1σ error ellipse overlaid in dashed red. The
white crosses denote the positions of the R ≥ 0.8 SDSS DR7 counterparts from our likelihood ratio analysis. These sources are discussed in more detail
in section A1. Using our SDSS DR7 likelihood ratio analysis and the higher-resolution SPIRE 250 μm positions as our starting point, we are able to derive
R ≥ 0.8 counterparts for four of the five IIFSCz sources, positions of which are given in Table A1. The exception is F08555+0145, for which the bright galaxy
approximately centred on the PACS 100 μm source is not present in the SDSS DR7 primary photometry catalogue (however we include a manually-measured
position in Table A1).

Table A1. Updated positions of the IIFSCz sources, which were previously mis-identified, or identified with only NVSS/IRAS positions in Wang & Rowan-Robinson
(2009). The position angles of the IRAS positional error ellipses were orientated 107° East of North.
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expects to find some low values of P' even if there were no gen- 
uine associations between the SPIRE sources and FIRST objects. 
We used a Monte-Carlo simulation to determine that 15 of the 105 
associations are likely to be spurious. To correct for this, we calcu- 
lated a new probability for each association, P — aP' , where a 
is a constant that we calculated using "^^i ctPi = 15. We took the 
conservative decision to treat associations with P < 0.2 as coun- 
terparts which are likely to be genuine, which rejected 29 of the 
original 105 associations. 
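The rescaling above amounts to choosing α = 15 / Σ P'_i, so that the corrected probabilities sum to the expected number of spurious associations. Since all 105 values satisfy P' < 0.002, their sum is below 0.21 and α must exceed roughly 71. The sketch below illustrates the step with hypothetical P' values and a hypothetical expected number of spurious matches; neither is taken from the catalogue, and the function name is illustrative only.

```python
import numpy as np


def rescale_chance_probabilities(p_prime, expected_spurious):
    """Scale P' by a constant alpha so that sum(alpha * P') equals the
    expected number of spurious associations."""
    p_prime = np.asarray(p_prime, dtype=float)
    alpha = expected_spurious / p_prime.sum()
    return alpha, alpha * p_prime


# Toy inputs (hypothetical): five associations, 0.6 expected to be spurious.
p_prime = np.array([1e-5, 2e-4, 5e-4, 1.5e-3, 1.9e-3])
alpha, p = rescale_chance_probabilities(p_prime, expected_spurious=0.6)

# Associations with P < 0.2 are treated as likely to be genuine.
likely_genuine = p < 0.2
```

In this toy case the two associations with the largest P' are rejected by the P < 0.2 cut, mirroring the way the cut removes 29 of the 105 associations in the text.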

There were a total of 76 SPIRE sources with P < 0.20 counterparts, and
each of these was scrutinised using the FIRST and SDSS images displayed
side-by-side with the Downes et al. and LR analysis overlaid. In this
manner, we compared the results of the two independent identification
methods. In forty-two cases, the P < 0.20 radio counterpart is also
identified as having R ≥ 0.80 in the r-band data, and the two methods
choose the same counterpart.



There are thirty SPIRE sources with high quality (P < 0.20) FIRST
counterparts which we do not recover in our LR analysis, including
twenty-three SPIRE sources which do not have any r-band counterparts in
our SDSS DR7 data (presumably distant, optically-faint radio sources).
Of the remaining seven sources with P < 0.20 FIRST counterparts:

• Four counterparts are detected in the optical data but have
low reliabilities due to their faint magnitudes, or large separations
in comparison to the value of σ_pos derived from the 250 μm
source SNR.

• Two sources have multiple, possibly interacting components
with L > 10.0 but R < 0.8, only one of which is a radio source
(these sources are discussed in section 4.3).

• In one further instance, the radio source has a double-lobed 
structure (a so-called FR-II, following Fanaroff & Riley, 1974), not 
coincident with either the dust emission or the starlight in the plane 
of the sky. The lobes of this FR-II are extremely bright; as a result,
the P statistic suggests that there is a low probability of a chance 
association, even though the separation between the SPIRE posi- 
tion and the FIRST centroid is large. The LR technique identifies 
the apparent host galaxy - aligned at the centre, between the two 
luminous radio jets - as having L = 0.0 due to its large separation 
(~ 9 arcsec) from the SPIRE centroid; this is an example of the 
limitations of the Downes et al. method. 

However, none of these cases contaminates the 250 μm selected sample
with incorrect associations. There are, however, four instances where
distinct counterparts have P < 0.20 and R ≥ 0.80; here, the opposite is
potentially true and the two methods conflict. These sources have
derived reliabilities of 0.87, 0.98, 0.81 and 0.93 as compared with
distinct Downes et al. counterparts with P statistics of 0.08, 0.07,
0.02 and 0.19, respectively. These sources are shown in Figure A2, in
which the 10.0 arcsec search radius centred on the 250 μm position is
shown in red, any unreliable optical counterparts in black, the reliable
optical ID in light blue, and the radio contours overlaid in royal blue.
In two of the four cases, the additional sources implied by the radio
data are visible in Ks-band observations from VIKING (Sutherland, 2009),
indicating that these sources are not merely effects of the larger
positional uncertainty in FIRST as compared with SDSS. Furthermore,
three of the four sources have SPIRE colours
$S_{250}/S_{350} \lesssim 1.5$, suggesting
high redshifts (z > 1) or cold dust temperatures, with the former
being at odds with the photometric redshifts of their most reliable 
counterparts (z < 0.55). Sources with similar SPIRE colours and 
low-redshift counterparts are discussed in more detail in section 

Finally, we note that probabilistic arguments such as those discussed
here will inevitably present apparent disagreements for a small number
of sources within large samples. In the remaining 101 out of 105 cases,
however, the results of our LR analysis are consistent with those using
the FIRST catalogue and the P statistic, and crucially we recover an
additional 2,348 counterparts, compared with 31 extra counterparts to
the 250 μm sources obtained by using only the radio data.
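The source counts quoted in this comparison can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts quoted in the text for the FIRST / LR comparison.
radio_within_10_arcsec = 105      # SPIRE sources with a FIRST source nearby
rejected_by_p_cut = 29            # associations with P >= 0.20
accepted = radio_within_10_arcsec - rejected_by_p_cut

same_counterpart = 42             # P < 0.20 and R >= 0.80, same object
not_recovered_by_lr = 30          # P < 0.20 but no reliable LR counterpart
conflicting = 4                   # P < 0.20 and R >= 0.80, distinct objects

no_r_band_counterpart = 23        # subset of the thirty
remaining = not_recovered_by_lr - no_r_band_counterpart
faint_or_distant, multiple_components, double_lobed = 4, 2, 1

assert accepted == 76
assert same_counterpart + not_recovered_by_lr + conflicting == accepted
assert faint_or_distant + multiple_components + double_lobed == remaining
assert radio_within_10_arcsec - conflicting == 101
```

All four checks pass, so the sub-samples described above partition the 76 accepted associations without overlap.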



A3 Spitzer observations

An additional check on the identification process was conducted by
searching for mid-infrared data from the Spitzer Space Telescope
heritage archive, in order to compare the reliabilities from our r-band
catalogue with near- and mid-infrared images between 3.6 and 160 μm.
Four sets of observations were found which overlapped with the
Herschel-ATLAS SDP observations. These data can be used to examine the
regions surrounding the SPIRE IDs for additional sources which may not
be present in the r-band catalogue used for the identification process,
as a visual check on the effectiveness of the LR technique.

There are a total of 49 sources that have Spitzer data, and although
these data vary in sensitivity, there is no evidence that would suggest
a mis-identification from the r-band catalogue. An indication of a wrong
ID would be a reliable (R > 0.8) r-band counterpart assigned to a SPIRE
source for which the Spitzer data reveal a bright, previously unseen
source nearer to the SPIRE centroid. Indeed, in one case in particular
(H-ATLAS J090913.2+012111), the sensitive IRAC data reveal the power of
the LR technique. Although there are three potential counterparts in the
SDSS DR7 r-band catalogue, all within 6 arcsec of the SPIRE centroid,
they have all been given low reliability (R ≤ 0.30, and also L ≤ 0.20).
The IRAC 3.6 μm data reveal a fourth candidate counterpart within
1 arcsec of the SPIRE position, which is presumably the true
counterpart. The r-band and IRAC 3.6 μm data are presented in Figure A3,
with the various source positions overlaid to demonstrate the robustness
of the LR method for this particular source, but also the need for
longer-wavelength observations in order to be able to reliably identify
the counterparts to higher redshift sources. The forthcoming data from
the VISTA Kilo-degree INfrared Galaxy (VIKING) survey and from the
Wide-field Infrared Survey Explorer (WISE; Duval et al., 2004) satellite
will enable this.

This example also highlights one crucial advantage of using the LR
technique for Herschel surveys rather than opting simply for the Downes
et al. method: the LR method takes into account the fact that not every
source has a counterpart that is brighter than the detection limit in
ancillary survey data.
Figure A2. Cases in which the LR method applied to the SDSS DR7 data and
the P-statistic (Downes et al. 1986) method applied to the FIRST data
produce different robust counterparts to 250 μm sources. In each panel,
the 10.0 arcsec search radius around the 250 μm position is shown in
red, with any unreliable counterparts circled in black. Reliable
counterparts from the LR analysis are circled in light blue, while the
royal blue contours reveal the FIRST counterpart. The P statistic for
the FIRST source and the value of the reliability, R, of the most
reliable SDSS DR7 counterpart are given in the subfigure captions for
each 250 μm object. Thumbnail images are orientated such that North is
up and East is to the left.
Figure A3. SDSS r-band (left) and Spitzer Space Telescope IRAC 3.6 μm
(right) images of the region surrounding source H-ATLAS
J090913.2+012111, at α = 137.305, δ = 1.3532 (position shown by the red
circle, which has a radius of 10.0 arcsec). The r-band image contains
three sources (blue 2 arcsec circles) that are identified as potential
counterparts to the SPIRE source, with reliabilities of 0.00, 0.29, and
0.00 (and L = 0.00, 0.17 and 0.00) for the sources labelled A, B and C,
respectively. The IRAC 3.6 μm channel image (right) reveals an
additional source within 1 arcsec of the SPIRE centroid (dotted light
blue 2 arcsec radius circle). The low reliabilities associated with the
r-band detections indicate the power of the LR technique in this
context. Images are orientated such that North is up and East is to the
left.
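The reliabilities quoted in the Figure A3 caption can be reproduced from the likelihood ratios under stated assumptions. The sketch below assumes the standard reliability definition of Sutherland & Saunders (1992), R_j = L_j / (Σ_i L_i + 1 − Q0), and adopts Q0 = 0.583 for the fraction of true counterparts brighter than the r-band limit. This value is an assumption chosen to be consistent with the ~60 per cent quoted in the abstract; it is not stated in this appendix.

```python
def reliabilities(likelihood_ratios, q0):
    """R_j = L_j / (sum_i L_i + (1 - Q0)) for the candidates of one source.

    The (1 - Q0) term is the weight given to the possibility that the
    true counterpart is fainter than the survey limit.
    """
    denominator = sum(likelihood_ratios) + (1.0 - q0)
    return [lr / denominator for lr in likelihood_ratios]


Q0_ASSUMED = 0.583  # assumption, see the lead-in; not taken from this appendix

# Likelihood ratios for candidates A, B and C from the Figure A3 caption.
lr_values = [0.00, 0.17, 0.00]
rel = reliabilities(lr_values, Q0_ASSUMED)

# Probability that none of the three r-band candidates is the counterpart.
p_no_counterpart = (1.0 - Q0_ASSUMED) / (sum(lr_values) + 1.0 - Q0_ASSUMED)

print([round(r, 2) for r in rel])   # [0.0, 0.29, 0.0]
print(round(p_no_counterpart, 2))   # 0.71
```

Under these assumptions the best r-band candidate has R = 0.29, matching the caption, while the remaining probability of about 0.71 is assigned to the true counterpart being undetected in the r band, consistent with the additional source revealed at 3.6 μm.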



